Not a worry, a discipline — finding how a deployed model fails before its users do
Day 1 of 60
"AI safety" sounds like a feeling — a vague unease about powerful technology. It isn't. It's an engineering discipline with its own methods, metrics, and artifacts. Its job is concrete: given a model and how it's deployed, find the ways it can fail, misbehave, or be misused — before a user, an attacker, or the public finds them for you — and then measure and reduce those failures.
A capable model is not automatically a safe one. Capability and safety are different properties, measured differently and earned separately. The people who can systematically find, measure, and reduce a system's failures are the bottleneck on whether it can be responsibly deployed. This track makes you one of those people.
Over 12 weeks you'll move through three layers of the field: applied safety (the hands-on craft of taxonomies, red-teaming, and evaluation), alignment research literacy (the deeper question of why capable systems can pursue the wrong goal), and governance (the frameworks and regulation that turn testing into accountability). Today we install the map so everything later has a place to hang.
Almost every AI safety concern is one of three types. Conflating them is the most common rookie mistake; a practitioner names which one they're talking about.
Misuse. The model works as intended, but a human points it at something harmful: generating disallowed content, assisting an attack, harassing someone. The failure is in who's using it and how. Defenses: content policy, refusals, monitoring, access controls.
Accidents. No bad actor required. The model does something its designers didn't want: a harmful output on a benign prompt, a confidently wrong medical answer, an agent that takes a destructive action. The failure is in the system's own behavior. Defenses: alignment techniques, robustness, evaluation.
Systemic risk. No single output is the problem; the aggregate effect is. Think concentration of power, erosion of trust, labor effects, or an AI race that cuts safety corners. The failure is in the broader system the model is embedded in. Defenses: governance, policy, institutional design.
Misuse is about users. Accidents are about the model. Systemic risk is about the world the model is deployed into. Different causes, different defenses. When someone says "AI is dangerous," your first move is to ask: which of the three?
Ethics tells you what you should want (don't deploy systems that cause harm). It doesn't tell you whether your system does. That gap, between good intentions and verified behavior, is where safety engineering lives. The seminal paper Concrete Problems in AI Safety (Amodei et al., 2016) made this move explicit, reframing safety as five tractable engineering problems: avoiding negative side effects, avoiding reward hacking, scalable oversight, safe exploration, and robustness to distributional shift (a deployment world that differs from the training data).
That reframing is why you can become an AI safety practitioner without training a single neural network. The skills that make systems safer — writing precise policies, running rigorous red-teams, designing honest evaluations, reasoning about failure — are safety skills, not modeling skills.
You don't need to be an ML researcher to do this work. You need to think adversarially, measure honestly, and communicate clearly about risk. Those are learnable, and they're exactly what this track drills.
The single most portable skill in applied safety is threat modeling: given a system, systematically ask what could go wrong, for whom, and how would we catch it? Then rank the answers so you work on what matters most. You'll build a real one on Day 3 and reuse the habit for 12 weeks. Today, just internalize the question, because it's the lens through which the rest of the field is viewed.
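To make the question concrete, here is a minimal Python sketch of a threat model as a small, ranked list. It assumes a simple likelihood × severity score on hypothetical 1 to 5 scales with ties broken by severity; the `Threat` and `rank` names, the scoring rule, and the customer-support chatbot example are all illustrative assumptions, not the method you'll build on Day 3. Each entry is tagged with one of the three risk types from the taxonomy above.

```python
from dataclasses import dataclass
from enum import Enum


class RiskType(Enum):
    """The three failure types from the taxonomy above."""
    MISUSE = "misuse"      # failure lies in who uses the model and how
    ACCIDENT = "accident"  # failure lies in the model's own behavior
    SYSTEMIC = "systemic"  # failure lies in the world the model is deployed into


@dataclass(frozen=True)
class Threat:
    """One row of a threat model: what could go wrong, for whom, how we'd catch it."""
    what: str
    who_is_harmed: str
    risk_type: RiskType
    detection: str   # how would we catch it?
    likelihood: int  # hypothetical 1-5 scale
    severity: int    # hypothetical 1-5 scale

    @property
    def score(self) -> int:
        # Assumed scoring rule: likelihood x severity, as in a simple risk matrix.
        return self.likelihood * self.severity


def rank(threats: list[Threat]) -> list[Threat]:
    """Order threats so the highest-scoring ones are worked on first.

    Ties are broken by severity, so a rare-but-severe threat outranks a
    common-but-mild one with the same score.
    """
    return sorted(threats, key=lambda t: (t.score, t.severity), reverse=True)


if __name__ == "__main__":
    # Hypothetical deployment: a customer-support chatbot with refund tools.
    threats = [
        Threat("User coaxes the bot into issuing unauthorized refunds",
               "the company", RiskType.MISUSE,
               "audit log of tool calls vs. refund policy", likelihood=4, severity=3),
        Threat("Bot gives confidently wrong warranty terms on a benign question",
               "customers", RiskType.ACCIDENT,
               "sampled transcript review against the policy doc", likelihood=3, severity=2),
        Threat("Agent deletes a customer record while 'cleaning up' an account",
               "customers", RiskType.ACCIDENT,
               "pre-deployment red-team of destructive tool paths", likelihood=2, severity=5),
    ]
    for t in rank(threats):
        print(f"{t.score:>2}  [{t.risk_type.value}]  {t.what}")
```

Running it lists the refund-abuse threat first (score 12), then the record deletion (10), then the warranty error (6). Breaking ties on severity is one defensible choice among several; the point is that the ranking is explicit, so it can be argued with and revised.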
The full curated, verified resource list for this week is at the bottom of the page; start with the ones marked "Start here."
An enthusiast says "AI could be dangerous." An expert immediately decomposes: which risk — misuse, accident, or systemic — for whom, and how would we measure it? The altitude jump is from having a concern to having a threat model: a structured, ranked answer you can act on and defend.
Say this in an interview: "I don't treat 'is it safe' as one question. I separate misuse, accidental misbehavior, and systemic harm, because each has different causes, different defenses, and different owners — and I start by threat-modeling the specific deployment, not the technology in the abstract."